In this guide, we’ll walk you through the installation of the
BioLizardStyleR package and provide a brief overview of its
functions using a practical example. The BioLizardStyleR
package helps you make ggplot2 plots in the BioLizard style. It includes
an integrated font installer, a dedicated ggplot2 theme, custom color
functions, and a tool to add a BioLizard footer and export the plot.
To install the BioLizardStyleR package directly from
GitHub, you will need to utilize the devtools package. The
package requires our signature font “Lato”. If the font is not yet
installed, installation will be attempted automatically when you load
the library.
if (!requireNamespace("devtools", quietly = TRUE)) {
install.packages("devtools")
}
# make sure to look up BioConductor packages (used in the vignette)
setRepositories(ind = c(1:6, 8))
devtools::install_github("lizard-bio/nature-grade-visualization-playground", subdir="BioLizardStyleR")
For building the vignette, you also need the pasilla
dataset. In case of issues with the installation of this package, you
might have to install libbz2 (libbz2-dev (deb), libbz2-devel (rpm),
bzip2 (brew)).
library(BioLizardStyleR)
If automatic font installation fails on your system, you can install the font manually using the following steps:
Transforming your ggplot to match the BioLizard style involves a three-step process. Let’s illustrate this on a basic ggplot.
library(ggplot2)
data("mtcars")
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5), ordered = TRUE)
testplot <- ggplot2::ggplot(data = mtcars, ggplot2::aes(x = hp, y = mpg)) +
ggplot2::geom_point(ggplot2::aes(color = gear),size=3) +
ggplot2::labs(title = "Miles per Gallon vs. Horsepower",
x = "Horsepower",
y = "Miles per Gallon",
color = "Gears")
testplot
In the initial step, we implement the biolizard theme through the
lizard_style() function. This not only modifies various
theme settings but also adjusts the font. An important note here is that
the lizard theme can only act as a starting template! Depending on your
specific plot, title length, variables in use, and other factors, you
might need to make further theme adjustments.
testplot <- testplot + lizard_style()
testplot
Tip: you can use
theme_set(lizard_style()) at the beginning of your script
or report to set the theme to lizard_style() in all plots, without
having to add + lizard_style() to each plot individually.
Te reset the theme to the ggplot2 default, you can call
reset_theme_settings().
Once you’ve styled your plot, you can enhance it with a ‘BioLizard’ color palette. You have the option to choose from qualitative, sequential, or divergent palettes tailored for either discrete or continuous variables. These palettes incorporate the distinct hues of the BioLizard house colors, while ensuring they are color-blind friendly and perceptually uniform. For a deeper understanding of each palette, it’s recommended to consult the documentation for the respective function.
The coloring functions within the package adhere to a flexible structure: scale_color/fill_biolizard. The type (discrete/continuous) and the scheme (qualitative/sequential/divergent) must be determined as function parameters.
data("mtcars")
mtcars$gear <- as.factor(mtcars$gear)
# Base plot
testplot <- ggplot2::ggplot(data = mtcars, ggplot2::aes(x = hp, y = mpg)) +
ggplot2::geom_point(aes(color = gear), size=3) +
ggplot2::labs(
title = "Miles per Gallon vs. Horsepower",
x = "Horsepower",
y = "Miles per Gallon",
color = "Gears"
)
# Applying the qualitative color scale
testplot_qualitative <- testplot +
scale_color_biolizard(type = "discrete", scheme = "qualitative") +
lizard_style()
testplot_qualitative
# Applying the sequential color scales (assuming you have a different dataset or aesthetic)
## viridis-like scale
testplot_sequential <- testplot +
scale_color_biolizard(type = "discrete", scheme = "l_viridis") +
lizard_style()
testplot_sequential
## sequential scale inspired by BioLizard green
testplot_sequential <- testplot +
scale_color_biolizard(type = "discrete", scheme = "sequential") +
lizard_style()
testplot_sequential
# Applying the divergent color scale (assuming you have a different dataset or aesthetic)
testplot_divergent <- testplot +
scale_color_biolizard(type = "discrete", scheme = "divergent") +
lizard_style()
testplot_divergent
It is possible to reverse the color scales by setting
reverse = TRUE
testplot_divergent <- testplot +
scale_color_biolizard(type = "discrete", scheme = "divergent", reverse = TRUE) +
lizard_style()
testplot_divergent
The discrete color palettes can also be called directly, in case you
would like to use them outside ggplot. The three base colors can also be
used directly as blz_green, blz_blue and
blz_yellow.
#use color palette
mycolors <- biolizard_pal_sequential(3, reverse = TRUE)
gearcolors <- mycolors[as.factor(mtcars$gear)]
graphics::plot(x = mtcars$hp, y = mtcars$mpg, pch = 19, col = gearcolors, family = "Lato")
graphics::legend("topright", legend = paste("Gears", 3:5), col = mycolors, pch = 19, bty = "n")
#use BLZ base colors
mycolors <- c(blz_green, blz_blue, blz_yellow)
gearcolors <- mycolors[as.factor(mtcars$gear)]
graphics::plot(x = mtcars$hp, y = mtcars$mpg, pch = 19, col = gearcolors, family = "Lato")
graphics::legend("topright", legend = paste("Gears", 3:5), col = mycolors, pch = 19, bty = "n")
Note that calling lizard_style() will overwrite the
default colors for points, lines, boxplots, areas, etc.. . To reset them
to the ggplot2 defaults, you can either restart the R session or use
reset_colors().
In the plot below, the points are shown in BioLizard green. Using
reset_colors() will revert these changes:
p <- ggplot2::ggplot(mtcars, ggplot2::aes(mpg, disp)) +
ggplot2::geom_point()
p
reset_colors()
p
x = 1:8
y = 1:8
df <- data.frame(x=x, y=y)
ggplot2::ggplot(df, ggplot2::aes(x=x, y=y, fill = factor(x))) +
ggplot2::geom_col() +
scale_fill_biolizard() +
lizard_style()
ids <- factor(c("1.1", "2.1", "1.2", "2.2", "1.3", "2.3"))
values <- data.frame(
id = ids,
value = c(3, 3.1, 3.1, 3.2, 3.15, 3.5)
)
positions <- data.frame(
id = rep(ids, each = 4),
x = c(2, 1, 1.1, 2.2, 1, 0, 0.3, 1.1, 2.2, 1.1, 1.2, 2.5, 1.1, 0.3,
0.5, 1.2, 2.5, 1.2, 1.3, 2.7, 1.2, 0.5, 0.6, 1.3),
y = c(-0.5, 0, 1, 0.5, 0, 0.5, 1.5, 1, 0.5, 1, 2.1, 1.7, 1, 1.5,
2.2, 2.1, 1.7, 2.1, 3.2, 2.8, 2.1, 2.2, 3.3, 3.2)
)
datapoly <- merge(values, positions, by = c("id"))
ggplot2::ggplot(datapoly, ggplot2::aes(x = x, y = y)) +
ggplot2::geom_polygon(ggplot2::aes(group = id, fill = id)) +
scale_fill_biolizard(type = "discrete", scheme = "sequential") +
lizard_style()
ggplot2::ggplot(data = mtcars, ggplot2::aes(x = as.factor(gear), y = mpg)) +
ggplot2::geom_boxplot() +
ggplot2::labs(
title = "Miles per Gallon vs. Gears",
x = "Gears",
y = "Miles per Gallon"
) +
lizard_style()
A violin plot is similar to a boxplot, but captures more information on the data distribution:
ggplot2::ggplot(data = mtcars, ggplot2::aes(x = as.factor(gear), y = mpg)) +
ggplot2::geom_violin() +
ggplot2::labs(
title = "Miles per Gallon vs. Gears",
x = "Gears",
y = "Miles per Gallon"
) +
lizard_style()
ggplot2::ggplot(data = mtcars, ggplot2::aes(x = mpg)) +
ggplot2::geom_density() +
lizard_style()
ggplot2::ggplot(data = mtcars, ggplot2::aes(x = hp, y = mpg)) +
ggplot2::geom_point(ggplot2::aes(color= mpg)) +
ggplot2::geom_smooth() +
ggplot2::labs(
title = "Miles per Gallon vs. horsepower",
x = "Horsepower",
y = "Miles per Gallon",
color = "Miles per Gallon",
fill = "Miles per Gallon"
) +
scale_color_biolizard(type = "continuous", scheme = "l_viridis") +
lizard_style()
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Faceting is a great way to lay out related plots side-by-side, with the axes either on the same scale or on different scales.
# use facet_wrap to stratify plots by one variable
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
ggplot2::geom_point() +
ggplot2::facet_wrap(ggplot2::vars(gear)) +
lizard_style() +
ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))
# use facet_grid for two variables
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
ggplot2::geom_point() +
ggplot2::facet_grid(rows = ggplot2::vars(gear), cols = ggplot2::vars(carb)) +
lizard_style() +
ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))
Note that the minimal style makes it difficult to distinguish the
different blocks from each other. Readability of the figure is improved
by plotting the axes for each plot, but they are removed by
facet_grid and facet_wrap when the plots are
on the same scale. The similar functions facet_rep_grid and
facet_rep_wrap from the lemon package are more suitable
with the lizard style, since they will show all axes.
library("lemon")
# use facet_wrap to stratify plots by one variable
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
ggplot2::geom_point() +
lemon::facet_rep_wrap(ggplot2::vars(gear)) +
lizard_style() +
ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))
# use facet_grid for two variables
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
ggplot2::geom_point() +
lemon::facet_rep_grid(rows = ggplot2::vars(gear), cols = ggplot2::vars(carb)) +
lizard_style() +
ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))
Treemap plots are a powerful way of visualizing hierarchical data such as GO terms, for example. The size of the rectangles corresponds to the value you want to show (e.g. p value), and related rectangles are grouped together. This example makes use of the Pokémon dataset from the highcharter package and is based on https://yjunechoe.github.io/posts/2020-06-30-treemap-with-ggplot/.
In this example, more colors are requested than are present in the
qualitative color scales. The biolizard_pal_hue palette was
developped for situations like this. It is a specific version of the
scales::pal_hue function which is used by default by
ggplot2, that starts at Biolizard’s signature green.
The treemapify package allows plotting treemaps with ggplot2.
library(treemapify)
library(dplyr)
data("pokemon", package = "highcharter")
# Cleaning up data for a treemap
data <- pokemon |>
dplyr::select(pokemon, type_1, type_2) |>
dplyr::mutate(type_2 = ifelse(is.na(type_2), paste("only", type_1), type_2)) |>
dplyr::group_by(type_1, type_2) |>
dplyr::summarise(n = length(pokemon)) |>
dplyr::ungroup()
#> `summarise()` has grouped output by 'type_1'. You can override using the `.groups` argument.
head(data)
#> # A tibble: 6 × 3
#> type_1 type_2 n
#> <chr> <chr> <int>
#> 1 bug electric 4
#> 2 bug fairy 2
#> 3 bug fighting 3
#> 4 bug fire 2
#> 5 bug flying 13
#> 6 bug ghost 1
# plot the treemap
ggplot2::ggplot(data, ggplot2::aes(area = n, label = type_2, fill = type_1,
subgroup = type_1)) +
ggplot2::ggtitle("Number of pokémon stratified by type") +
# 1. Draw type_2 borders and fill colors
treemapify::geom_treemap() +
# 2. Draw type_1 borders
treemapify::geom_treemap_subgroup_border() +
# 3. Print type_1 text
treemapify::geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 0.5, colour = "black",
fontface = "italic", min.size = 0,
family = "Lato") + #need to add "Lato" manually as this text is not accessed in theme()
# 4. Print type_2 text
treemapify::geom_treemap_text(colour = "white", place = "topleft", reflow = T,
family = "Lato") + #need to add "Lato" manually as this text is not accessed in theme()
# 5. BLZ style
scale_fill_biolizard(scheme = "hues") +
lizard_style() +
ggplot2::theme(legend.position = "none")
However, this is not ideal, since the font of the text still has to be defined manually, and thus the lizard_style() function does not work very well for these treemaps, even though they are ggplot2-based. Another option is to use the ‘treemap’ library, and move away from ggplot2 altogether, which generally leads to more aesthetically pleasing plots:
library(treemap)
# get colors from the biolizard hues palette
ncols <- length(unique(data$type_1))
palette <- biolizard_pal_hue(ncols)
treemap::treemap(data, index=c("type_1", "type_2"), vSize="n", type="index",
title="Number of pokémon stratified by type",
palette=palette, #set the BLZ color palette
fontcolor.labels=c("#FFFFFFDD", "#00000080"), bg.labels=0, border.col="#00000080",
fontfamily.title = "Lato", fontfamily.labels = "Lato", fontfamily.legend = "Lato") #set the lato font
Idea from “A ggplot2 Tutorial for Beautiful Plotting in R”.
chic <- utils::read.csv("https://figshare.com/ndownloader/files/42307179") #An example dataset
g <- ggplot2::ggplot(chic, ggplot2::aes(x = season, y = o3,
color = season)) +
ggplot2::labs(x = "Season", y = "Ozone") +
ggplot2::geom_violin(fill = "gray80", linewidth = 1, alpha = .5) +
ggplot2::geom_jitter(alpha = .25, width = .3) +
ggplot2::coord_flip()
g + lizard_style() +
scale_color_biolizard()
library(corrr)
library(forcats)
corm <-
chic |>
dplyr::select(temp, dewpoint, pm10, o3) |>
corrr::correlate(diagonal = 1) |>
corrr::shave(upper = FALSE)
#> Correlation computed with
#> • Method: 'pearson'
#> • Missing treated using: 'pairwise.complete.obs'
corm <- corm |>
tidyr::pivot_longer(
cols = -term,
names_to = "colname",
values_to = "corr"
) |>
dplyr::mutate(
rowname = forcats::fct_inorder(term),
colname = forcats::fct_inorder(colname),
label = dplyr::if_else(is.na(corr), "", sprintf("%1.2f", corr))
)
g <- ggplot2::ggplot(corm, ggplot2::aes(rowname, fct_rev(colname),
fill = corr)) +
ggplot2::geom_tile() +
ggplot2::geom_text(ggplot2::aes(label = label)) +
ggplot2::coord_fixed(expand = FALSE) +
ggplot2::labs(x = NULL, y = NULL)
g + scale_fill_biolizard(type='continuous',scheme='divergent',limits = c(-1, 1),
na.value = "white", name="Pearson\nCorrelation") +
lizard_style() +
ggplot2::theme(legend.position = "inside", legend.position.inside = c(.95, .8))
g <- ggplot2::ggplot(chic, ggplot2::aes(temp, o3)) +
ggplot2::geom_hex(color = "grey") +
ggplot2::labs(x = "Temperature (°F)",
y = "Ozone Level",
title = 'How Temperature Affects Ozone Levels',
subtitle ='Hexagonal Binning of Ozone Levels vs. Temperature in °F')
g <- g + lizard_style() +
scale_fill_biolizard(type = 'continuous',scheme = 'sequential')
g
finalise_lizardplot(g,source="Source: The Chicago dataset (https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/)")
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : no font could be found for family "Nunito Sans 10pt Medium"
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : no font could be found for family "Nunito Sans 10pt Medium"
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : no font could be found for family "Nunito Sans 10pt Medium"
ggplotly can be used to make a ggplot interactive, for
example to use in a html report. See an example of in interactive PCA
plot and volcano plot below. To generate these plots, we used the
per-gene read counts of the pasilla
dataset. Make sure that you do not apply
lizard_style() to the ggplot call when you want to use
ggplotly. The function lizard_layout() applies
the lizard theme to a plotly object.
Note: to succesfully install Pasilla, you might have to manually install libbz2 (libbz2-dev (deb), libbz2-devel (rpm), bzip2 (brew)).
library(pasilla)
library(plotly)
# get gene counts
datafiles <- system.file("extdata", package="pasilla", mustWork=TRUE)
count.table <- utils::read.table(file.path(datafiles, "pasilla_gene_counts.tsv"), header=TRUE, row.names=1, quote="", comment.char="" )
head(count.table)
#> untreated1 untreated2 untreated3 untreated4 treated1 treated2 treated3
#> FBgn0000003 0 0 0 0 0 0 1
#> FBgn0000008 92 161 76 70 140 88 70
#> FBgn0000014 5 1 0 0 4 0 0
#> FBgn0000015 0 2 1 2 1 0 0
#> FBgn0000017 4664 8714 3564 3150 6205 3072 3334
#> FBgn0000018 583 761 245 310 722 299 308
# get metadata
SampleAnno <- utils::read.csv(file.path(datafiles, "pasilla_sample_annotation.csv"))
head(SampleAnno)
#> file condition type number.of.lanes total.number.of.reads exon.counts
#> 1 treated1fb treated single-read 5 35158667 15679615
#> 2 treated2fb treated paired-end 2 12242535 (x2) 15620018
#> 3 treated3fb treated paired-end 2 12443664 (x2) 12733865
#> 4 untreated1fb untreated single-read 2 17812866 14924838
#> 5 untreated2fb untreated single-read 6 34284521 20764558
#> 6 untreated3fb untreated paired-end 2 10542625 (x2) 10283129
# DE analysis with edgeR
library(edgeR)
Group <- as.factor(c(rep("untreated", 4), rep("treated", 3)))
y <- edgeR::DGEList(counts = count.table, group = Group)
# filter genes
keep <- edgeR::filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]
# normalize counts
y <- edgeR::calcNormFactors(y)
norm.counts <- edgeR::cpm(y, normalized.lib.sizes = TRUE, log = T)
# plot interactive PCA
mds <- limma::plotMDS(y, gene.selection="common", plot = F)
toplot <- data.frame(Dim1 = mds$x, Dim2 = mds$y, Group = Group, sample = colnames(mds$distance.matrix.squared))
pcaPlot <- ggplot2::ggplot(toplot, ggplot2::aes(Dim1, Dim2, colour = Group)) +
ggplot2::geom_point() +
ggplot2::geom_text(ggplot2::aes(label = sample), size = 3, nudge_y = 0.05) +
ggplot2::coord_fixed() +
ggplot2::xlab(paste0("PC1: ", round(mds$var.explained[1]*100), "% variance")) +
ggplot2::ylab(paste0("PC2: ", round(mds$var.explained[2]*100), "% variance")) +
scale_color_biolizard()
plotly::ggplotly(pcaPlot) |> lizard_layout()
# The lizard style is added through lizard_layout() on plotly instead of lizard_style() in ggplot
# fit general linear model
design <- stats::model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)
y <- edgeR::estimateDisp(y, design)
fit <- edgeR::glmQLFit(y, design)
# fit contrasts of interest
my.contrasts <- limma::makeContrasts(treatedVsUntreated = treated-untreated,
levels=design)
lrt <- edgeR::glmQLFTest(fit, contrast = my.contrasts)
DE <- edgeR::topTags(lrt, n="all")$table
colnames(DE) <- c("log2FoldChange", "logCPM", "LR", "pvalue", "padj")
DE <- DE[order(DE$padj), ]
DE <- DE[!is.na(DE$padj), ]
head(DE)
#> log2FoldChange logCPM LR pvalue padj
#> FBgn0039155 -4.605077 5.882320 1032.5624 3.519114e-11 2.786786e-07
#> FBgn0025111 2.906975 6.925429 718.6612 2.001949e-10 7.926717e-07
#> FBgn0029167 -2.189661 8.222828 527.6780 8.769068e-10 2.151196e-06
#> FBgn0035085 -2.549919 5.685708 504.5058 1.086600e-09 2.151196e-06
#> FBgn0003360 -3.171310 8.451316 467.1444 1.567356e-09 2.482378e-06
#> FBgn0039827 -4.149923 4.399320 412.0585 2.861632e-09 3.476204e-06
# interactive volcano plot
p <- DE |>
ggplot2::ggplot(ggplot2::aes(x = log2FoldChange, y = -log10(pvalue))) +
ggplot2::geom_point(ggplot2::aes(color = padj < 0.05)) +
scale_color_biolizard(reverse = TRUE)
plotly::ggplotly(p) |> lizard_layout()
note: to install ‘sf’ you might have to install libudunits2 and libgdal:
sudo apt-get update && sudo apt-get install libudunits2-dev libgdal-devdnf install udunits2-devel libgdal-develbrew install udunits &
brew install gdallibrary("sf")
library("rnaturalearth")
library("rnaturalearthdata")
world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
BLZ_offices <- c("Belgium", "Netherlands", "Switzerland", "United States of America")
world$BLZ_office <- ifelse(world$name_en %in% BLZ_offices, "yes", "no")
world$BLZ_office <- factor(world$BLZ_office, levels = c("yes", "no"))
ggplot2::ggplot(data = world, ggplot2::aes(fill = BLZ_office)) +
ggplot2::geom_sf() +
scale_fill_biolizard(type = "discrete", scheme = "qualitative") +
lizard_style()
# coloring the ocean, as in the map on our website
ggplot2::ggplot(data = world, ggplot2::aes(fill = BLZ_office)) +
ggplot2::geom_sf() +
ggplot2::scale_fill_manual(values = c(blz_yellow, blz_green)) +
lizard_style() +
ggplot2::theme(panel.background = ggplot2::element_rect(fill = blz_blue),
panel.grid = ggplot2::element_blank())
# europe only
ggplot2::ggplot(data = dplyr::filter(world, continent == "Europe"), ggplot2::aes(fill = BLZ_office)) +
ggplot2::geom_sf() +
ggplot2::scale_fill_manual(values = c(blz_yellow, blz_green)) +
lizard_style() +
ggplot2::theme(panel.background = ggplot2::element_rect(fill = blz_blue),
panel.grid = ggplot2::element_blank()) +
ggplot2::coord_sf(xlim = c(-25, 50), ylim = c(35, 70), expand = FALSE)